Conversation
Introduce the core MPC cluster orchestration infrastructure that replaces Python's MpcCluster. This is the first of two stacked changes -- this one adds the cluster logic with todo!() stubs for the NEAR blockchain and sandbox components (filled in by the next change). Components: - cluster.rs: MpcCluster orchestrator with full setup sequence (sandbox, contract deploy, account creation, attestations, domain addition, node spawning), node lifecycle (kill/start/restart), contract operations (state queries, add_domains, resharing), metrics polling, and data management (wipe_db, block ingestion control). - mpc_node.rs: Updated MpcNode/MpcNodeSetup with metrics scraping (HTTP /metrics), wait_for_metric, set_block_ingestion, reserve_key_event_attempt, migration_state, wipe_db. Decoupled from NearNode -- takes genesis_path, boot_nodes, chain_id directly. - sandbox.rs: NearSandbox stub (todo!() -- Docker implementation in next change) - blockchain.rs: NearBlockchain + DeployedContract stubs (todo!() -- near-kit implementation in next change) Closes #2441
Code ReviewReviewed the full diff: new test infrastructure for MPC cluster orchestration (blockchain, cluster, mpc_node, near_sandbox modules), replacing No critical issues found. The code is well-structured test scaffolding with clean module boundaries. The Minor observations (non-blocking):
✅ Approved |
netrome
left a comment
There was a problem hiding this comment.
Overall looks very good, but I'm not a fan of all the unimplemented!() code paths, that means we can't start the MpcCluster. Feels like we should have started with those components instead and then wired them up in this PR. But anyway. Not a hard blocker.
This also means MpcCluster::start is completely dead code right now, and there are no tests calling it.
| let state = self.nodes.remove(idx); | ||
| let new_state = match state { | ||
| MpcNodeState::Running(node) => MpcNodeState::Stopped(node.kill()), | ||
| MpcNodeState::Stopped(setup) => { | ||
| tracing::warn!(node = idx, "node already stopped"); | ||
| MpcNodeState::Stopped(setup) | ||
| } | ||
| }; | ||
| self.nodes.insert(idx, new_state); |
There was a problem hiding this comment.
Hmm not super happy about this but I don't have any better suggestion right now. I was surprised to learn there's no good non-panicking remove method.
If this was production code I'd advocate for an index check to prevent the potential panic if out of bounds indexes are provided, but for test code I think it's fine.
There was a problem hiding this comment.
Actually this doesn't need to be Result<()> if we don't check the bound, added that check.
| let futures = clients.iter().map(|(i, account, client)| { | ||
| let args = args.clone(); | ||
| let method = method.to_string(); | ||
| async move { | ||
| self.contract | ||
| .call_from(client, &method, args) | ||
| .await | ||
| .with_context(|| format!("node {i} ({account}) failed to call {method}")) | ||
| } | ||
| }); | ||
|
|
||
| futures::future::try_join_all(futures).await?; | ||
| Ok(()) |
gilcu3
left a comment
There was a problem hiding this comment.
LGTM, although without having real tests I would not be surprised if something fails later 🙃
Left some comments, main one being the one about creating signing keys manually, we should avoid doing this type of code unless there is no alternative, and if needed, do it in a centralized place.
| use crate::port_allocator::E2ePortAllocator; | ||
|
|
||
| const DEFAULT_SANDBOX_IMAGE: &str = "nearprotocol/sandbox:2.10.7"; | ||
| const DEFAULT_SANDBOX_IMAGE: &str = "nearprotocol/sandbox:2.11.0-rc.3"; |
There was a problem hiding this comment.
hopefully we never need commits instead of tags here, as has occurred in the past.
There was a problem hiding this comment.
If this becomes a concern, we can support running sandbox without docker too, common functionality should (hopefully) remain same, just process management would be different.
This is the first of two stacked changes -- this one adds the cluster logic with unimplemented!() stubs for the NEAR blockchain (sandbox & client) that are implemented in the followup change.
Notable changes:
Closes #2441